EM for Perceptual Coding and Reinforcement Learning Tasks
Abstract
The paper presents an EM-based algorithm for reinforcement-driven clustering. As shown here, it is applicable to the reinforcement learning setting with a continuous state space and a discrete action space. The E-step of the algorithm computes the posterior over states given the data and the reinforcement. Although designed to discover intrinsic states, the algorithm performs action selection without explicit state identification.

Learning algorithms are an important area of research in intelligent robotic systems. Intelligence requires that the system be able to adapt to its environment. One particularly difficult aspect of adaptation is the problem of selective attention and the ability of the system to generalize. Efficient generalization in the space of observations leads to the emergence of observation states and results in more efficient action selection algorithms. In this paper we present our work on an EM-based reinforcement learning algorithm which allows for action selection in the presence of a hidden state. The algorithm provides a probabilistic framework that unifies the state identification task with action selection. The algorithm helps an agent learn a simple association task: given sensory data and a discrete set of actions, the agent needs to find a mapping from observations to actions such that the reward is maximized. We approach this problem via a set of latent variables. For the purposes of this paper we restrict our attention to the case where the number of states is known. In the future we plan to estimate it from the data as well.

1 Related work

A similar problem of reinforcement-driven clustering was addressed by Likas in [4], where a reinforcement learning algorithm was combined with a neural network to improve vector quantization in the input space. In contrast, this paper extends the Expectation-Maximization (EM) algorithm first introduced in [2]. The problem of action selection in the presence of a hidden state is developed in [5]. In our current experiments we are not concerned with the reinforcement learning itself; rather, we present a useful connection between an unsupervised EM algorithm and any traditional state-based reinforcement learning algorithm. However, we plan to fully explore reinforcement learning algorithms in future work. For these purposes, [3] and [7] provide an overview of the field of reinforcement learning. An application of multiple-model Q-learning and a POMDP model to selective attention in computer vision was introduced by Darrell and Pentland in [1], which uses this model for gesture recognition.

2 EM for perceptual coding

State identification is necessary in many traditional reinforcement learning algorithms. In continuous perceptual spaces there often exist task-dependent natural perceptual categories. This paper is concerned with the discovery of such perceptual categories while allowing the agent to perform action selection in the presence of unknown perceptual states. Without the task context, such categories can be found efficiently in a density-estimation framework by an Expectation-Maximization algorithm with a mixture distribution. We assume that observations come from a mixture in which each component is tied to a state s, with density p(x|s), weighted by the corresponding prior state probability p(s):¹

p(x) = Σ_s p(s) p(x|s)

¹ We want to use this algorithm as a part of a learning system of a synthetic creature.
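The excerpt does not spell out how the reinforcement re-weights the E-step, so the following is a minimal sketch under stated assumptions: an isotropic Gaussian mixture over continuous observations, and a table q_values of per-state action values that any traditional state-based RL method could maintain, matching the connection the paper draws. The function names (e_step, m_step, select_action) and the softmax action rule are ours, not the authors'. The sketch shows the two pieces the text describes: the E-step posterior p(s|x) under the mixture p(x) = Σ_s p(s) p(x|s), and action selection that marginalizes over the hidden state instead of committing to a single identified state.

```python
import numpy as np

def gaussian_pdf(X, mu, var):
    """Isotropic Gaussian density N(x | mu, var * I) for each row of X."""
    d = X.shape[-1]
    diff = X - mu
    return np.exp(-0.5 * np.sum(diff**2, axis=-1) / var) / (2 * np.pi * var) ** (d / 2)

def e_step(X, priors, mus, variances):
    """Posterior p(s | x_n) for each observation under p(x) = sum_s p(s) p(x|s)."""
    K = len(priors)
    lik = np.stack([gaussian_pdf(X, mus[k], variances[k]) for k in range(K)], axis=1)
    post = priors * lik                       # shape (N, K), unnormalized
    return post / post.sum(axis=1, keepdims=True)

def m_step(X, post):
    """Re-estimate mixture priors and Gaussian parameters from responsibilities."""
    N, d = X.shape
    Nk = post.sum(axis=0)                     # effective count per state
    priors = Nk / N
    mus = (post.T @ X) / Nk[:, None]
    variances = np.array([
        (post[:, k] * np.sum((X - mus[k])**2, axis=1)).sum() / (d * Nk[k])
        for k in range(len(Nk))
    ])
    return priors, mus, variances

def select_action(x, priors, mus, variances, q_values, rng):
    """Action selection without explicit state identification:
    marginalize per-state action values over the state posterior,
    pi(a | x) = sum_s p(s | x) pi(a | s).
    q_values is a hypothetical (K, A) table of per-state action values."""
    post = e_step(x[None, :], priors, mus, variances)[0]   # p(s | x), shape (K,)
    prefs = post @ q_values                                # expected value per action
    probs = np.exp(prefs - prefs.max())                    # softmax over actions
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```

With these pieces, an outer loop would alternate e_step and m_step over a batch of observations and call select_action to act, letting the reward signal update q_values through whatever state-based RL rule is plugged in; how the reinforcement itself enters the E-step posterior is left open here, as in the abstract's description.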